The ‘T’ is ETL stands for transformation. The goal of transformation is to convert raw input data to an OLAP-friendly data model. This is also known as dimensional modeling.
Microsoft SQL Server Integration Services (SSIS) is a platform for building high-performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing. SSIS includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service for administering package execution and storage; and application programming interfaces (APIs) for programming the Integration Services object model.
When designing the ETL process it’s good to think about the three fundamental things it needs to do:
An ETL process is a program that periodically runs on a server and orchestrates the refresh of the data in the BI system. SQL Server Integration Services (SSIS) is a development tool and runtime that is optimized for building ETL processes. Learning SSIS involves a steep learning curve and if you have a software development background as I do, you might first be inclined to build your ETL program from scratch using a general-purpose programming language such as C#. However, once you master SSIS you’ll be able to write very efficient ETL processes much more quickly. This is because SSIS lets you design ETL processes in a graphical way (but if needed you can write parts using VB or C#). The SSIS components are highly optimized for ETL type tasks and the SSIS run-time executes independent tasks in parallel where possible. If you’re a programmer you’ll find it amazingly difficult to write your own ETL process using a general-purpose language and make it run more efficiently than one developed in SSIS.
Inclined to build a profession as SSIS Developer? Then here is the blog post on, explore SSIS Training
In SSIS you can design your ETL process using control flows and data flows. Data flows in SSIS are a type of control flow that allows you to extract data from an external data sources, flow that data through several transformations such as sorting, filtering, merging it with other data and converting data types, and finally store the result at a destination, usually a table in the data warehouse. This is very powerful and data flows seem to lend themselves very well for integrating the extract and transformation tasks within them. This is why I call this the “obvious” approach and many tutorials about SSIS follow this approach. The obvious approach seems especially attractive because it is very efficient and there’s no need to store intermediate results.
The figure below illustrates this process:
The top-level control flow in the Integration Services project may look like this:
The “Extract and Transform” box is a sequence container that holds a data flow for each of the tables that will be refreshed in the data warehouse. In this example, there is one fact table and there are three dimension tables. SSIS will execute the data flows in parallel, and when all of them have completed the cube will be processed.
The transformation of data takes place in the data flows.
The transformations needed in each of the data flows would typically look something like this:
You liked the article?
Like: 0
Vote for difficulty
Current difficulty (Avg): Medium
TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.